Coordinate Dual Averaging for Decentralized Online
Optimization with Nonseparable Global Objectives*

Soomin Lee, Angelia Nedic, and Maxim Raginsky 


Abstract: We consider a decentralized online convex optimization
problem in a network of agents, where each agent controls
only a coordinate (or a part) of the global decision vector. For
such a problem, we propose two decentralized variants (ODA-C
and ODA-PS) of Nesterov's primal-dual algorithm with dual
averaging. In ODA-C, to mitigate the disagreements on the
primal-vector updates, the agents implement a generalization of
the local information-exchange dynamics recently proposed by Li
and Marden [1] over a static undirected graph. In ODA-PS, the
agents implement the broadcast-based push-sum dynamics [2]
over a time-varying sequence of uniformly connected digraphs.
We show that the regret bounds in both cases have sublinear
growth of $O(\sqrt{T})$, with the time horizon $T$, when the stepsize
is of the form $1/\sqrt{t}$ and the objective functions are
Lipschitz-continuous convex functions with Lipschitz gradients. We also
implement the proposed algorithms on a sensor network to
complement our theoretical analysis.


I. Introduction 

Decentralized optimization has recently been receiving significant
attention due to the emergence of large-scale distributed
algorithms in machine learning, signal processing,
and control applications for wireless communication networks,
power networks, and sensor networks; see, for example, 0- 
A central generic problem in such applications is decentralized
resource allocation for a multiagent system, where
the agents collectively solve an optimization problem in the
absence of full knowledge about the overall problem structure.
In such settings, the agents are allowed to communicate
with each other some relevant estimates so as to learn the
information needed for an efficient global resource allocation.
The decentralized structure of the problem is reflected in the 
agents’ local view of the underlying communication network, 
where each agent exchanges messages only with its neighbors. 

In recent literature on control and optimization, an extensively
studied decentralized resource allocation problem is one
where the system objective function $f(x)$ is given as a sum
of local objective functions, i.e., $f(x) = \sum_{i=1}^n f_i(x)$ where
fi is known only to agent i; see, for example ||^-p5). In 
this case, the objective function is separable across the agents, 
but the agents are coupled through the resource allocation 
vector $x$. Each agent maintains and updates its own copy
of the allocation/decision vector $x$, while trying to estimate
an optimal decision for the system problem. The vector $x$ is
assumed to lie in (a subset of) $\mathbb{R}^d$, where $d$ may or may not
coincide with the number of agents $n$.

Another decentralized resource allocation problem is the
one where the system objective function $f(x)$ may not admit
a natural decomposition of the form $\sum_{i=1}^n f_i(x)$, and
the resource allocation vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ is
distributed among the agents, where each agent $i$ is responsible
for maintaining and updating only a coordinate (or a part)
$x_i$ of the whole vector $x$. Such decentralized problems have
been considered in ||2^-||^ (see also the textbook |[3T|). In 
the preceding work, decentralized approaches converge when 
the agents are using weighted averaging, or when certain 
contraction conditions are satisfied. Recently, Li and Marden
[1] have proposed a different algorithm with local updates,
where each agent $i$ keeps estimates for the variables $x_j$, $j \ne i$,
that are controlled by all the other agents in the network.
The convergence of this algorithm relies on some contraction 
properties of the iterates. Note that all the aforementioned 
algorithms were developed for offline optimization problems. 

Our work in this paper is motivated by the ideas of Li and
Marden [1] and also by the broadcast-based subgradient push,
which was originally developed by Kempe et al. [2] and later
extended in [32] and in [15], [16] to distributed optimization.
Specifically, we use the local information exchange model
of [1] and [2], [15], [16], [32], but employ a different online
decentralized algorithm motivated by the work of Nesterov
[33]. We call these algorithms ODA-C (Online Dual Averaging
with Circulation-based communication) and ODA-PS
(Online Dual Averaging with Push-Sum based communication),
respectively.

In contrast with existing methods, our algorithms have
the following distinctive features: (1) We consider an online
convex optimization problem with nondecomposable system
objectives, which are functions of a distributed resource
allocation vector. (2) In our algorithms, each agent maintains
and updates its private estimate of the best global allocation
vector at each time, but contributes only one coordinate to the
network-wide decision vector. (3) We provide regret bounds
in terms of the true global resource allocation vector $x$ (rather
than some estimate of $x$ by a single agent). For both ODA-C
and ODA-PS, we show that the regret has sublinear growth
of order $O(\sqrt{T})$ in time $T$ with the stepsize of the form
$1/\sqrt{t+1}$.

Our proposed algorithm ODA-PS is closest to recent papers
[34], [35]. The papers proposed a decentralized algorithm for
online convex optimization which is very similar to ODA-PS
in the sense that they also introduce online subgradient
estimations in primal [34] or dual [35] space into information
aggregation using push-sum. In these papers, the agents share
a common decision set in $\mathbb{R}^d$, the objective functions are
separable across the agents at each time (i.e., $f_t(x) = \sum_{i=1}^n f_t^i(x)$
for all $t$), and the regret is analyzed in terms of each agent's
own copy of the whole decision vector $x \in \mathbb{R}^d$. Moreover,
an additional assumption is made in [34] that the objective
functions are strongly convex.

The paper is organized as follows. In Section II, we formalize
the problem and describe how the agents interact. In
Section III, we provide an online decentralized dual-averaging
algorithm in a generic form and establish a basic regret bound
which can be used later for particular instantiations, namely,
for the two algorithms ODA-C and ODA-PS. These algorithms
are analyzed in Sections IV and V, where we establish $O(\sqrt{T})$
regret bounds under mild assumptions. In Section VI, we
demonstrate our analysis by simulations on a sensor network.
We conclude the paper with some comments in Section VII.


Notation: All vectors are column vectors. For vectors associated
with agent $i$, we use a subscript $i$ such as, for example,
$x_i$, $z_i$, etc. We will write $x_i^k$ to denote the $k$th coordinate value
of a vector $x_i$. We will work with the Euclidean norm, denoted
by $\|\cdot\|$. We will use $e_1, \dots, e_n$ to denote the unit vectors in the
standard Euclidean basis of $\mathbb{R}^n$. We use $\mathbf{1}$ to denote a vector
with all entries equal to 1, while $I$ is reserved for an identity
matrix of a proper size. For any $n \ge 1$, the set of integers
$\{1, \dots, n\}$ is denoted by $[n]$. We use $\sigma_2(A)$ to denote the
second largest singular value of a matrix $A$.


II. Problem formulation

Consider a multiagent system (network) consisting of $n$
agents, indexed by elements of the set $V = [n]$. Each agent
$i \in V$ takes actions in an action space $X$, which is a closed and
bounded interval of the real line.¹ At each time, the multiagent
system incurs a time-varying cost $f_t$, which comes from a fixed
class $\mathcal{F}$ of convex functions $f : X^n \to \mathbb{R}$.

The communication among agents in the network is governed
by either one of the two following models:

(G1) An undirected connected graph $\mathcal{G} = (V, \mathcal{E})$: If agents
$i$ and $j$ are connected by an edge (which we denote
by $i \leftrightarrow j$), then they may exchange information with
one another. Thus, each agent $i \in V$ may directly
communicate only with the agents in its neighborhood
$\mathcal{N}_i = \{j \in V : i \leftrightarrow j\} \cup \{i\}$. Note that agent $i$ is always
contained in its own neighborhood.

(G2) Time-varying digraphs $\mathcal{G}(t) = (V, \mathcal{E}(t))$, for $t \ge 1$: If
there exists a directed link from agent $j$ to $i$ at time
$t$ (which we denote by $(j, i)$), agent $j$ may send its
information to agent $i$. We use the notation $\mathcal{N}_i^{\rm in}(t)$ and
$\mathcal{N}_i^{\rm out}(t)$ to denote the in and out neighbors of agent $i$ at
time $t$, respectively. That is,

$$\mathcal{N}_i^{\rm in}(t) \triangleq \{j \mid (j,i) \in \mathcal{E}(t)\} \cup \{i\}, \qquad \mathcal{N}_i^{\rm out}(t) \triangleq \{j \mid (i,j) \in \mathcal{E}(t)\} \cup \{i\}.$$

In this case, we assume that there always exists a self-loop
$(i,i)$ for every agent $i \in V$. Therefore, agent $i$ is
always contained in its own neighborhood. Also, we use
$d_i(t)$ to denote the out degree of node $i$ at time $t$, i.e.,

$$d_i(t) \triangleq |\mathcal{N}_i^{\rm out}(t)|.$$

We assume $B$-strong connectivity of the graphs $\mathcal{G}(t)$
with some scalar $B > 0$, i.e., a graph with the following
edge set

$$\mathcal{E}_B(t) = \bigcup_{i=(t-1)B+1}^{tB} \mathcal{E}(i)$$

is strongly connected for every $t \ge 1$. In other words,
the union of the edges appearing for $B$ consecutive time
instances periodically constructs a strongly connected
graph. This assumption is required to ensure that there
exists a path from one node to every other node infinitely
often even if the underlying network topology is
time-varying.

The network interacts with an environment according to the
protocol shown in Figure 1. We leave the details of the signal
generation process vague for the moment, except to note that
the signals received by all agents at time $t$ may depend on all
the information available up to time $t$ (including $f_1, \dots, f_t$, as
well as all of the local information exchanged in the network).
Moreover, the environment may be adaptive, i.e., the choice
of the function $f_t$ may depend on all of the data generated by
the network up to time $t$.

¹ Everything easily generalizes to $X$ being a compact convex subset of a
multidimensional space; we mainly stick to the scalar case for simplicity.


Parameters: base action space $X$; network graph $\mathcal{G} = (V, \mathcal{E})$;
function class $\mathcal{F}$

For each round $t = 1, 2, \dots$:

(1) Each agent $i \in V$ selects an action $x^i(t) \in X$

(2) Each agent $i \in V$ exchanges local information with its
neighbors $\mathcal{N}_i$

(3) The environment selects the current objective $f_t \in \mathcal{F}$, and
each agent receives a signal about $f_t$

Fig. 1. Online optimization with global objectives and local information. 

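Let us denote the network action at time $t$ by

$$x(t) \triangleq (x^1(t), \dots, x^n(t)) \in X^n. \tag{1}$$

We consider the network regret $R(T)$ at an arbitrary time
horizon $T \ge 1$:

$$R(T) \triangleq \sum_{t=1}^{T} f_t(x(t)) - \inf_{y \in X^n} \sum_{t=1}^{T} f_t(y). \tag{2}$$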

Thus, $R(T)$ is the difference between the total cost incurred
by the network up to time $T$ and the smallest total cost that could
have been achieved with a single action in $X^n$ in hindsight (i.e.,
with perfect advance knowledge of the sequence $f_1, \dots, f_T$)
and without any restriction on the communication between the
agents. The problem is to design the rule (or policy) each agent
$i \in V$ should use to determine its action $x^i(t)$ based on the
local information available to it at time $t$, such that the regret
in (2) is (a) sublinear as a function of the time horizon $T$ and
(b) exhibits "reasonable" dependence on the number of agents
$n$ and on the topology of the communication graphs.
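As a concrete illustration of the regret in (2), the sketch below evaluates it for one hypothetical family of objectives, $f_t(x) = \frac{1}{2}\|x - c_t\|^2$ on the box $X^n = [\mathrm{lo}, \mathrm{hi}]^n$. The quadratic family, the function name `network_regret`, and its arguments are assumptions made only for this example; they do not come from the paper.

```python
import numpy as np

def network_regret(actions, targets, lo, hi):
    """Regret (2) for the assumed objectives f_t(x) = 0.5 * ||x - c_t||^2 on [lo, hi]^n."""
    actions = np.asarray(actions, dtype=float)   # shape (T, n): row t is the network action x(t)
    targets = np.asarray(targets, dtype=float)   # shape (T, n): row t is the target c_t
    incurred = 0.5 * np.sum((actions - targets) ** 2)
    # For this quadratic family the best fixed action in hindsight is separable:
    # the coordinate-wise mean of the targets, clipped to the interval.
    best = np.clip(targets.mean(axis=0), lo, hi)
    benchmark = 0.5 * np.sum((best - targets) ** 2)
    return incurred - benchmark
```

Playing the clipped mean in every round gives zero regret in this toy setting, which matches the interpretation of the benchmark as the best single action in hindsight.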

The regret in (2) is defined over the true network actions
of individual agents, i.e., the $x^i(t)$'s, rather than in terms of
some estimates of $x(t)$ by individual agents. This notion
of regret, which, to the best of our knowledge, was
first introduced in [30], is inspired by the literature on team
decision theory and decentralized control problems: The online
optimization is performed by a team of cooperating agents
facing a time-varying sequence of global objective functions
$f_t$, which are nondecomposable (in contrast to decomposable
objectives where $f_t^i$ is only revealed to agent $i$).

Communication among agents is local, as dictated by the
network topology, so no agent has all the information needed
to compute a good global decision vector $x(t)$. By comparing
the cumulative performance of the decentralized system to the
best centralized decision achievable in hindsight, the regret
in (2) captures the effect of decentralization. It also calls for
analysis techniques that are different from existing methods in
the literature.

III. The basic algorithm and regret bound

We now introduce a generic algorithm for solving the
decentralized online optimization problem defined in Section II.
The algorithm uses the dual-averaging subgradient method
of Nesterov as an optimization subroutine.

Each agent $i \in V$ generates a sequence $\{(x_i(t), z_i(t))\}_{t \ge 0}$ in
$X^n \times \mathbb{R}^n$, where the primal iterates

$$x_i(t) = (x_i^1(t), \dots, x_i^n(t)) \in X^n$$

and the dual iterates

$$z_i(t) = (z_i^1(t), \dots, z_i^n(t)) \in \mathbb{R}^n$$

are updated recursively as follows:

$$z_i^k(t+1) = \frac{\delta_i^k}{r_i}\, u_i(t) + F_{i,t}^k(m_i(t)), \quad k \in [n] \tag{3a}$$

$$x_i(t+1) = \Pi_X^{\psi}\big(G_{i,t}(z_i(t+1)),\, \alpha(t)\big) \tag{3b}$$

with the initial condition $z_i(0) = 0$ for all $i \in V$. In the
dual update (3a), $\delta_i^k$ is the Kronecker delta symbol, $r_i > 0$
is a positive weight parameter, $u_i(t) \in \mathbb{R}$ is a local update
computed by agent $i$ at time $t$ based on the received signal
about $f_t$, $m_i(t)$ are the messages received by agent $i$ at time $t$
[from agents in $\mathcal{N}_i$ under the model (G1) or from $\mathcal{N}_i^{\rm in}(t)$ under
the model (G2)], and $F_{i,t}^k$, $k \in [n]$, are real-valued mappings
that perform local averaging of $m_i(t)$. In the primal update
(3b), $G_{i,t} : \mathbb{R}^n \to \mathbb{R}^n$ is a mapping on dual iterates, $\{\alpha(t)\}_{t \ge 0}$
is a nonincreasing sequence of positive step sizes, and the
mapping $\Pi_X^{\psi} : \mathbb{R}^n \times (0, \infty) \to X^n$ is defined by

$$\Pi_X^{\psi}(z, \alpha) \triangleq \arg\min_{x \in X^n} \Big\{ \langle z, x \rangle + \frac{1}{\alpha}\, \psi(x) \Big\}, \tag{4}$$

where $\psi : X^n \to \mathbb{R}_+$ is a nonnegative proximal function.
We assume that $\psi$ is 1-strongly convex with respect to the
Euclidean norm $\|\cdot\|$, i.e., for any $x, y \in X^n$ we have

$$\psi(y) \ge \psi(x) + \langle \nabla\psi(x), y - x \rangle + \frac{1}{2}\|x - y\|^2, \tag{5}$$

where $\nabla\psi$ denotes an arbitrary subgradient of $\psi$.

The dual iterate $z_i(t)$ computed by agent $i$ at time $t$ will
be an estimate of the "running average of the subgradients"
as seen by agent $i$, and will constitute an approximation
of the true centralized dual-averaging subgradient update of
Nesterov's algorithm. The messages from $\mathcal{N}_i$ entering into the
dual-space dynamics are crucial for mitigating any disagreement
between the agents' local estimates of what the network
action should be. The primal iterate $x_i(t)$ of agent $i$ at time
$t$ is an approximation of the true centralized primal point for
the subgradient evaluation.

Note that in (3a) the local update $u_i(t)$ based on the signal
about $f_t$ affects only the $i$th coordinate of the dual
iterate $z_i(t+1)$, while all other coordinates with $k \ne i$ remain
untouched except for the averaging. The action of agent $i$ at
time $t$ is then given by

$$x^i(t) = x_i^i(t),$$

i.e., by the $i$th component of the vector $x_i(t)$.

A concrete realization of the algorithm (3a)-(3b) requires
specification of the rules for computing the local update $u_i(t)$,
the messages exchanged by the agents, and the mappings $F_{i,t}^k$
and $G_{i,t}$. In this paper, we present two different instantiations
of this algorithm, namely, the circulation-based method
inspired by [1] and the push-sum based method inspired by [2],
[15], [16], [32]. We call these algorithms ODA-C (Online Dual
Averaging with Circulation-based communication) and ODA-PS
(Online Dual Averaging with Push-Sum based communication)
and detail them in Sections IV and V, respectively.

We now present a basic regret bound that can be used
for any generic algorithm of the form (3a)-(3b) under the
following assumption:

Assumption 1: All functions $f \in \mathcal{F}$ are Lipschitz continuous
with a constant $L$:

$$|f(x) - f(y)| \le L\|x - y\| \quad \text{for all } x, y \in X^n.$$
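To make the proximal mapping used in the primal update concrete, the following sketch evaluates it for one particular choice that satisfies the stated requirements: the proximal function $\psi(x) = \frac{1}{2}\|x\|^2$ (nonnegative and 1-strongly convex) on the box $X^n = [\mathrm{lo}, \mathrm{hi}]^n$. This choice of $\psi$ and the name `prox_map` are assumptions for illustration only; the paper allows any $\psi$ with these properties.

```python
import numpy as np

def prox_map(z, alpha, lo, hi):
    """Pi(z, alpha) = argmin over x in [lo, hi]^n of <z, x> + ||x||^2 / (2 * alpha).

    Assumes psi(x) = 0.5 * ||x||^2. The objective is then separable, so each
    coordinate is minimized at -alpha * z_k and clipped back to the interval.
    """
    return np.clip(-alpha * np.asarray(z, dtype=float), lo, hi)
```

With this choice the map $z \mapsto \Pi(z, \alpha)$ is $\alpha$-Lipschitz, because clipping is nonexpansive; this is the property the regret analysis relies on when it converts disagreement in the dual iterates into disagreement in the primal iterates.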

Theorem 1: Let $\{x_i(t)\}_{t=1}^{\infty} \subset X^n$, $i \in V$, be the sequences
of the agents' primal iterates, let $\{u(t)\}_{t=1}^{\infty}$ with
$u(t) = (u_1(t), \dots, u_n(t))$ be the sequence of the agents' local
updates, and let $\{\bar{x}(t)\}_{t=1}^{\infty} \subset X^n$ be generated as

$$\bar{x}(t+1) = \Pi_X^{\psi}\Big( \sum_{s=0}^{t} u(s),\, \alpha(t) \Big). \tag{6}$$

Then, under Assumption 1, the network regret $R(T)$ in (2)
can be upper-bounded in terms of $u(t)$ and $\bar{x}(t)$ as follows:
for each $T \ge 1$,

$$R(T) \le \underbrace{\frac{1}{2}\sum_{t=1}^{T} \alpha(t-1)\|u(t)\|^2 + \frac{C}{\alpha(T)}}_{\text{(E1)}} + \underbrace{L \sum_{t=1}^{T}\sum_{i=1}^{n} \|x_i(t) - \bar{x}(t)\|}_{\text{(E2)}} + \underbrace{\sqrt{n}\, D_X \sum_{t=1}^{T} \|\nabla f_t(\bar{x}(t)) - u(t)\|}_{\text{(E3)}},$$

where $D_X = \sup_{x,y \in X} |x - y|$ is the diameter of the set $X$, and
$C = \sup_{x \in X^n} |\psi(x)|$.

Remark: Since $\psi$ is a continuous function on the compact set
$X^n$, $C < \infty$ by the Weierstrass theorem.

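Proof: For any $t$ and any $y \in X^n$ we can write

$$\begin{aligned}
f_t(x(t)) - f_t(y) &= f_t(x(t)) - f_t(\bar{x}(t)) + f_t(\bar{x}(t)) - f_t(y) \\
&\le \langle \nabla f_t(x(t)), x(t) - \bar{x}(t) \rangle + \langle \nabla f_t(\bar{x}(t)), \bar{x}(t) - y \rangle \\
&\le L\|x(t) - \bar{x}(t)\| + \langle \nabla f_t(\bar{x}(t)), \bar{x}(t) - y \rangle,
\end{aligned} \tag{7}$$

where the second step follows from convexity of $f_t$, while the
last step uses the fact that all $f \in \mathcal{F}$ are $L$-Lipschitz. Recalling
that $x(t)$ is the network action vector (see (1)), we have the
following for the first term in (7):

$$\|x(t) - \bar{x}(t)\| = \Big\| \sum_{i=1}^{n} \big(x^i(t) - \bar{x}^i(t)\big) e_i \Big\| \le \sum_{i=1}^{n} \|x_i(t) - \bar{x}(t)\|, \tag{8}$$

where the equality follows from the definition of $x(t)$ in (1)
and $\bar{x}(t) = (\bar{x}^1(t), \dots, \bar{x}^n(t))$.

The second term in (7) can be further expanded as

$$\langle \nabla f_t(\bar{x}(t)), \bar{x}(t) - y \rangle = \langle u(t), \bar{x}(t) - y \rangle + \langle \nabla f_t(\bar{x}(t)) - u(t), \bar{x}(t) - y \rangle. \tag{9}$$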

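Now, from relation (6) we obtain

$$\bar{x}(t+1) = \arg\min_{x \in X^n} \Big\{ \sum_{s=0}^{t} \langle u(s), x \rangle + \frac{1}{\alpha(t)}\, \psi(x) \Big\}.$$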

Therefore, by Lemma 3], we can write 


$$\sum_{t=1}^{T} \langle u(t), \bar{x}(t) - y \rangle \le \frac{1}{2} \sum_{t=1}^{T} \alpha(t-1)\|u(t)\|^2 + \frac{\psi(y)}{\alpha(T)}. \tag{10}$$


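For the second term on the right-hand side of (9), we have

$$\begin{aligned}
\langle \nabla f_t(\bar{x}(t)) - u(t), \bar{x}(t) - y \rangle &\le \|\bar{x}(t) - y\|\, \|\nabla f_t(\bar{x}(t)) - u(t)\| \\
&\le \sqrt{n}\, D_X\, \|\nabla f_t(\bar{x}(t)) - u(t)\|.
\end{aligned} \tag{11}$$

Combining the estimates in Eqs. (7)-(11) and taking the
supremum over all $y \in X^n$, we get the desired result. □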

Theorem 1 indicates that the regret will be small provided that

(E1) The squared norms $\|u(t)\|^2$ remain bounded.

(E2) The agents' primal variables $x_i(t)$ do not drift too much
from the centralized vector $\bar{x}(t)$.

(E3) The vectors $u(t)$ stay close to the gradients $\nabla f_t(\bar{x}(t))$.

This theorem plays an important role in the sequel, since it
provides guidelines for designing the update rule $u_i(t)$ and the
mappings $F_{i,t}^k(\cdot)$ and $G_{i,t}(\cdot)$. We will also see later that the
centralized vector $\bar{x}(t)$ represents a "mean field" of the primal
iterates $x_i(t)$ for $i \in V$ at time $t$.


IV. ODA-C and its regret bound

We now introduce a decentralized online optimization algorithm
which uses a circulation-based framework for its dual
update rule (3a). We refer to this algorithm as ODA-C (Online
Dual Averaging with Circulation-based communication). ODA-C
uses the network model (G1) for its communication.

A. ODA-C

Fix a vector $r = (r_1, \dots, r_n)$ of positive weights and a
nonnegative $n \times n$ matrix $M$, such that $M_{ij} \ne 0$ only if $j \in \mathcal{N}_i$,
satisfying the following symmetry condition:

$$r_i M_{ij} = r_j M_{ji}, \quad i, j \in V. \tag{12}$$

Then, ODA-C uses the following instantiation of the update
rules in (3a)-(3b):

$$z_i^k(t+1) = \frac{\delta_i^k}{r_i}\, u_i(t) + z_i^k(t) + \sum_{j=1}^{n} M_{ij}\big( v_{j \to i}^k(t) - v_{i \to j}^k(t) \big), \quad k \in [n] \tag{13a}$$

$$x_i(t+1) = \Pi_X^{\psi}\big( z_i(t+1),\, \alpha(t) \big), \tag{13b}$$

where $v_{j \to i}(t) = (v_{j \to i}^1(t), \dots, v_{j \to i}^n(t)) \in \mathbb{R}^n$ represents a vector of
messages transmitted by agent $j$ to agent $i$, provided that
$j \in \mathcal{N}_i$. Since $i \in \mathcal{N}_i$, we may include the previous dual iterate
$z_i(t)$ and the outgoing messages $v_{i \to j}(t)$ in $m_i(t)$. The dual
update rule (13a) is inspired by the state dynamics proposed
by Li and Marden [1], whereas the primal update rule (13b)
is exactly what one has in Nesterov's scheme [33].

To complete the description of the algorithm, we must specify
the update policies $\{u_i(t)\}$ and the messages $\{v_{i \to j}(t)\}$.
We assume that all agents receive a complete description of
$f_t$. Agent $i$ then computes

$$u_i(t) = \langle \nabla f_t(x_i(t)), e_i \rangle, \quad i \in [n],\ t \ge 0, \tag{14}$$

and feeds this signal back into the dynamics (13a). Note,
however, that the execution of the algorithm will not change if
the agents never directly learn the full function $f_t$, nor even the
full gradient $\nabla f_t(x(t))$, but instead receive the local gradient
signal $\nabla f_t(x_i(t))$. The messages take the form

$$v_{i \to j}^k(t) = z_i^k(t) \tag{15}$$

for all $t$ and all agents $i, j \in V$ with $j \in \mathcal{N}_i$.


B. Regret of ODA-C with local gradient signals

Let $\bar{z}(t) = (\bar{z}^1(t), \dots, \bar{z}^n(t))$. Our regret analysis rests on
the following simple but important fact:

Lemma 1: The weighted sum

$$\bar{z}(t) = \sum_{i=1}^{n} r_i z_i(t)$$

evolves according to the linear dynamics

$$\bar{z}(t+1) = \bar{z}(t) + u(t), \tag{16}$$

where $u(t) = (u_1(t), \dots, u_n(t))$.

Remark: We observe that the relation in (16) holds regardless
of the choices of the updates $\{u_i(t)\}$ and the messages $\{v_{i \to j}(t)\}$. Moreover,
we point out that if $u(t) = \nabla f_t(\bar{x}(t))$, then the combination
of (16) and (13b) will reduce to a centralized online variant
of Nesterov's scheme [33].

Proof: Let $V^k(t)$ denote the $n \times n$ matrix with entries
$[V^k(t)]_{ij} = v_{i \to j}^k(t) - v_{j \to i}^k(t)$. Then

$$\bar{z}^k(t+1) = \sum_{i=1}^{n} r_i z_i^k(t+1) = \bar{z}^k(t) + u_k(t) + \mathrm{tr}\big[\tilde{M} V^k(t)\big],$$

where $\tilde{M}$ is an $n \times n$ matrix with entries $\tilde{M}_{ij} = r_i M_{ij}$.
Since $\tilde{M}$ is a symmetric matrix, by (12), and $V^k(t)$ is
skew-symmetric, $\mathrm{tr}[\tilde{M} V^k(t)] = 0$, so we obtain (16). □
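The conservation property in Lemma 1 is easy to check numerically. The sketch below implements one step of the dual update (13a) with the messages (15) in matrix form and builds a hypothetical pair $(r, M)$ on a ring that satisfies the symmetry condition (12). The helper names (`oda_c_dual_step`, `weighted_mean`, `ring_weights`) and the particular edge weights are assumptions chosen for this illustration, not part of the paper.

```python
import numpy as np

def oda_c_dual_step(Z, u, r, M):
    """One step of (13a) with messages v_{i->j}^k(t) = z_i^k(t).

    Z[i, k] holds z_i^k(t); u[i] is agent i's local update u_i(t).
    """
    # sum_j M_ij * (z_j^k - z_i^k), written in matrix form
    circulation = M @ Z - M.sum(axis=1, keepdims=True) * Z
    # np.diag(u / r) adds u_i / r_i to coordinate k = i of agent i only
    return Z + circulation + np.diag(u / r)

def weighted_mean(Z, r):
    """The weighted sum z_bar(t) = sum_i r_i z_i(t)."""
    return r @ Z

def ring_weights(n, rng):
    """Hypothetical (r, M) on a ring graph with r_i * M_ij = r_j * M_ji."""
    r = rng.uniform(0.5, 1.5, size=n)
    r /= r.sum()
    S = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        S[i, j] = S[j, i] = 0.1 * min(r[i], r[j])  # symmetric edge weight
    M = S / r[:, None]                              # M_ij = S_ij / r_i
    return r, M
```

Starting from $z_i(0) = 0$ and applying `oda_c_dual_step` repeatedly, `weighted_mean(Z, r)` should equal the running sum of the update vectors $u(s)$, whatever values those updates take.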

Lemma 1 indicates that the vector $\bar{z}(t)$ can be seen as a "mean
field" of the local dual iterates $z_i(t)$ for $i \in V$ at time $t$. Also,
if we define

$$\bar{x}(t+1) = \Pi_X^{\psi}\big( \bar{z}(t+1),\, \alpha(t) \big),$$

then from relation (16) we have

$$\bar{x}(t+1) = \Pi_X^{\psi}\Big( \sum_{s=0}^{t} u(s),\, \alpha(t) \Big),$$

which coincides with relation (6) in Theorem 1. This allows
us to make use of Theorem 1 in analyzing the regret of this
algorithm. Furthermore, the definition of $\bar{x}(t)$ and relation (14)
indicate that $u(t)$ will stay close to the centralized gradient
$\nabla f_t(\bar{x}(t))$, and as a consequence, the errors (E1) and (E3) in
Theorem 1 will remain small.

We now particularize the bound in Theorem 1 to this
scenario under the following additional assumption:

Assumption 2: All functions $f \in \mathcal{F}$ are differentiable and
have Lipschitz continuous gradients with constant $G$:

$$\|\nabla f(x) - \nabla f(y)\| \le G\|x - y\|, \quad \forall f \in \mathcal{F};\ x, y \in X^n.$$

Theorem 2: Under Assumptions 1-2, the regret of any
algorithm of the form (13a)-(13b), and with $u(t)$ computed
according to (14), can be upper-bounded as follows:

$$R(T) \le \frac{nL^2}{2} \sum_{t=1}^{T} \alpha(t-1) + \frac{C}{\alpha(T)} + \big(L + \sqrt{n}\, G D_X\big) \sum_{t=1}^{T} \alpha(t-1) \sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|.$$

Proof: The terms on the right-hand side of the bound in
Theorem 1 can be further estimated as follows. Since each
$f_t \in \mathcal{F}$ is $L$-Lipschitz,

$$\|u(t)\|^2 = \sum_{i=1}^{n} |u_i(t)|^2 \le \sum_{i=1}^{n} \|\nabla f_t(x_i(t))\|^2 \le nL^2.$$

It remains to estimate term (E3) in Theorem 1. To that end,
we write

$$\begin{aligned}
\|\nabla f_t(\bar{x}(t)) - u(t)\| &= \Big\| \sum_{i=1}^{n} \langle \nabla f_t(\bar{x}(t)) - \nabla f_t(x_i(t)), e_i \rangle e_i \Big\| \\
&\le \sum_{i=1}^{n} \|\nabla f_t(\bar{x}(t)) - \nabla f_t(x_i(t))\| \\
&\le G \sum_{i=1}^{n} \|\bar{x}(t) - x_i(t)\|,
\end{aligned}$$

where we have exploited the fact that the gradients of all $f \in \mathcal{F}$
are $G$-Lipschitz.

Now, by construction,

$$\begin{aligned}
\|\bar{x}(t) - x_i(t)\| &= \big\| \Pi_X^{\psi}(\bar{z}(t), \alpha(t-1)) - \Pi_X^{\psi}(z_i(t), \alpha(t-1)) \big\| \\
&\le \alpha(t-1)\, \|\bar{z}(t) - z_i(t)\|,
\end{aligned}$$

where the last step follows from the fact that the map
$z \mapsto \Pi_X^{\psi}(z, \alpha)$ is $\alpha$-Lipschitz (see, e.g., [33, Lemma 1]).
Substituting these estimates into the bound in Theorem 1, we
get the result. □

This bound indicates that, if the network-wide disagreement
term behaves nicely, the regret $R(T)$ will be sublinear in $T$
with a proper choice of the step size $\alpha(t)$. We illustrate this
more specifically in the following corollary.

Corollary 1: Suppose that the policies for computing
$\{u_i(t)\}$ and $\{v_{i \to j}(t)\}$ are such that, for all $t$ and for any
sequence $f_1, \dots, f_T \in \mathcal{F}$,

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\| \le K$$

for some finite constant $K > 0$ (which may depend on $n$
and on other problem parameters). Then, the regret of the
algorithm (13a)-(13b) is bounded by

$$R(T) \le \Big( \frac{nL^2}{2} + K\big(L + \sqrt{n}\, G D_X\big) \Big) \sum_{t=1}^{T} \alpha(t-1) + \frac{C}{\alpha(T)}.$$

In particular, if we choose $\alpha(t) = 1/\sqrt{t+1}$, $t \ge 0$, then the
regret is of the order $O(\sqrt{T})$:

$$R(T) \le \big[ nL^2 + 2K\big(L + \sqrt{n}\, G D_X\big) \big] \sqrt{T} + C\sqrt{T+1}.$$
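The step from the general bound in Corollary 1 to the $O(\sqrt{T})$ rate uses only the elementary estimate $\sum_{t=1}^{T} 1/\sqrt{t} \le 2\sqrt{T}$ for the step size $\alpha(t) = 1/\sqrt{t+1}$. The short sketch below checks this numerically and evaluates the resulting bound; the function names and the numerical constants used to exercise it are placeholders chosen for illustration only.

```python
import math

def stepsize(t):
    """alpha(t) = 1 / sqrt(t + 1), for t >= 0."""
    return 1.0 / math.sqrt(t + 1)

def stepsize_sum(T):
    """sum_{t=1}^{T} alpha(t - 1) = sum_{t=1}^{T} 1 / sqrt(t)."""
    return sum(stepsize(t - 1) for t in range(1, T + 1))

def corollary1_bound(T, n, L, G, D_X, K, C):
    """Right-hand side of the O(sqrt(T)) bound in Corollary 1."""
    lead = n * L**2 + 2.0 * K * (L + math.sqrt(n) * G * D_X)
    return lead * math.sqrt(T) + C * math.sqrt(T + 1)
```

Because `stepsize_sum(T)` never exceeds `2 * math.sqrt(T)`, the general bound with this step size is always dominated by `corollary1_bound`.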


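C. Full regret analysis

We now show that the network-wide disagreement term is
indeed upper-bounded by some constant. We recall that $M_{ij} \ne 0$
only if $j \in \mathcal{N}_i$. In addition to this, we posit the following
assumptions on the pair $(r, M)$.

Assumption 3: The positive weights $r_1, \dots, r_n$ sum to one:

$$\sum_{i=1}^{n} r_i = 1 \quad \text{and} \quad r_i > 0 \text{ for each } i \in [n].$$

The matrix $M$ is row-stochastic, i.e.,

$$\sum_{j=1}^{n} M_{ij} = 1 \quad \text{for each } i \in [n].$$

The conditions we have imposed on the pair $(r, M)$ are
equivalent to saying that $M$ is the transition probability matrix
of a reversible random walk on $\mathcal{G}$ with invariant distribution
$r = (r_1, \dots, r_n)$ [38]. Let

$$z^k(t) = (z_1^k(t), \dots, z_n^k(t)), \quad k \in [n],\ t \ge 0, \tag{17}$$

and $r_* = \min_{1 \le i \le n} r_i$. We state the following bound for
$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2$.

Lemma 2: Under Assumptions 1 and 3, for the policy in
(14)-(15) we have

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2 \le \frac{nL^2}{r_*^3 \big(1 - \sqrt{1-\lambda}\big)^2}$$

for every $t \ge 1$, where

$$\|f\|_r = \sqrt{\sum_{i=1}^{n} r_i f_i^2}$$

is the $r$-weighted $\ell_2$-norm of the vector $f \in \mathbb{R}^n$, and where $\lambda$
denotes the spectral gap of $M$ [38], i.e.,

$$\lambda = \inf_{f \ne 0:\ \langle r, f \rangle = 0} \frac{\|f\|_r^2 - \|Mf\|_r^2}{\|f\|_r^2}.$$

Proof: From the definitions of $z_i(t)$, $\bar{z}(t)$, and $z^k(t)$, we
have

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2 = \sum_{k=1}^{n} \|z^k(t) - \bar{z}^k(t)\mathbf{1}\|^2. \tag{18}$$

Thus, we upper-bound the quantity on the right-hand side.
From (15), we can rewrite the dynamics (13a) as follows:

$$z^k(t+1) = M z^k(t) + \frac{1}{r_k}\, u_k(t)\, e_k, \tag{19}$$

where $z^k(t)$ is defined in (17). By unrolling the dynamics (19)
and (16) from time 0 to $t$ and recalling that $z_i(0) = 0$ for all
$i$, we obtain:

$$z^k(t) = \frac{1}{r_k} \sum_{s=0}^{t-1} u_k(s)\, M^{t-s-1} e_k. \tag{20}$$

Moreover, by the definition of $\bar{z}(t)$ in Eq. (16), we have

$$\bar{z}^k(t) = \frac{1}{r_k} \sum_{s=0}^{t-1} r_k u_k(s). \tag{21}$$

Note that $r_k = \langle r, e_k \rangle$. From (20) and (21), we have

$$\|z^k(t) - \bar{z}^k(t)\mathbf{1}\| \le \frac{1}{r_k} \sum_{s=0}^{t-1} \big\| M^{t-s-1} e_k - \langle r, e_k \rangle \mathbf{1} \big\|\, |u_k(s)|. \tag{22}$$

By the properties of Markov matrices [38], for any $f \in \mathbb{R}^n$,

$$\|M^t f - \langle r, f \rangle \mathbf{1}\|^2 \le \frac{1}{r_*} \|M^t f - \langle r, f \rangle \mathbf{1}\|_r^2 \le \frac{(1-\lambda)^t}{r_*} \|f - \langle r, f \rangle \mathbf{1}\|_r^2.$$

Therefore,

$$\begin{aligned}
\big\| M^{t-s-1} e_k - \langle r, e_k \rangle \mathbf{1} \big\|^2 &\le \frac{(1-\lambda)^{t-s-1}}{r_*} \|e_k - \langle r, e_k \rangle \mathbf{1}\|_r^2 \\
&= \frac{r_k (1 - r_k)}{r_*} (1-\lambda)^{t-s-1} \\
&\le \frac{(1-\lambda)^{t-s-1}}{r_*}.
\end{aligned} \tag{23}$$

From relations (22) and (23), we obtain

$$\|z^k(t) - \bar{z}^k(t)\mathbf{1}\|^2 \le \Big( \frac{1}{r_k \sqrt{r_*}} \sum_{s=0}^{t-1} (1-\lambda)^{(t-s-1)/2}\, |u_k(s)| \Big)^2 \le \frac{L^2}{r_*^3 \big(1 - \sqrt{1-\lambda}\big)^2},$$

where Assumption 1 is used in the last inequality. From this
and relation (18), we obtain

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2 \le \frac{nL^2}{r_*^3 \big(1 - \sqrt{1-\lambda}\big)^2},$$

which proves the stated result. □

Lemma 2 captures the effect of the underlying network topology
via the spectral gap $\lambda$ (also known as the Fiedler value),
which captures the algebraic connectivity of the network.
Since $\mathcal{G}$ is assumed to be connected, $\lambda > 0$.

By combining Theorem 2 and Lemma 2, we can now
provide a regret bound for ODA-C:

Theorem 3: Let Assumptions 1-3 hold. With the choice
$\alpha(t) = 1/\sqrt{t+1}$ for all $t \ge 0$, and under the policy (14)-(15), the
distributed algorithm ODA-C achieves the following regret:

$$R(T) \le nL^2 \Big[ 1 + \frac{2\big(L + \sqrt{n}\, G D_X\big)}{L\, r_*^{3/2} \big(1 - \sqrt{1-\lambda}\big)} \Big] \sqrt{T} + C\sqrt{T+1}.$$

Proof: By Lemma 2, the averaging policy (15) satisfies

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2 \le \frac{nL^2}{r_*^3 \big(1 - \sqrt{1-\lambda}\big)^2}.$$

Hence, by Jensen's inequality,

$$\sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\| \le \sqrt{ n \sum_{i=1}^{n} \|z_i(t) - \bar{z}(t)\|^2 } \le \frac{nL}{r_*^{3/2} \big(1 - \sqrt{1-\lambda}\big)}.$$

Therefore, the conditions of Corollary 1 hold with

$$K = \frac{nL}{r_*^{3/2} \big(1 - \sqrt{1-\lambda}\big)},$$

and the stated result follows. □

This shows that, for any fixed communication network $\mathcal{G}$
satisfying Assumption 3, the worst-case regret is bounded by
$O(\sqrt{T})$. The constants also capture the dependence on the
algebraic connectivity of the network via the spectral gap $\lambda$,
as well as on the network size $n$.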
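The role of the spectral gap can be illustrated numerically. The sketch below assumes that the gap is read as $\lambda = 1 - \rho^2$, where $\rho$ is the second-largest eigenvalue modulus of the reversible matrix $M$; this is one reading that is consistent with the contraction $\|M^t f - \langle r, f\rangle \mathbf{1}\|_r^2 \le (1-\lambda)^t \|f - \langle r, f\rangle \mathbf{1}\|_r^2$ used in the proof of Lemma 2, and it should be treated as an assumption. The function names are hypothetical.

```python
import numpy as np

def spectral_gap(M, r):
    """lambda = 1 - rho^2 for a matrix M that is reversible with respect to r."""
    d = np.sqrt(r)
    # Reversibility makes D^{1/2} M D^{-1/2} symmetric, with the same eigenvalues as M.
    S = (d[:, None] * M) / d[None, :]
    moduli = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2.0)))
    rho = moduli[-2]          # the largest modulus is 1 (stochastic matrix)
    return 1.0 - rho**2

def r_norm_sq(f, r):
    """Squared r-weighted l2 norm: sum_i r_i * f_i^2."""
    return float(np.sum(r * f**2))

def disagreement_constant(n, L, r, lam):
    """The constant K = n L / (r_*^{3/2} (1 - sqrt(1 - lambda))) from the proof of Theorem 3."""
    return n * L / (r.min() ** 1.5 * (1.0 - np.sqrt(1.0 - lam)))
```

A poorly connected graph gives a small $\lambda$ and therefore a large constant `disagreement_constant`, which is how the topology enters the regret bound.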
V. ODA-PS and its regret bound

We now introduce another decentralized online optimization
algorithm which uses the push-sum communication protocol
for its dual update rule (3a). We refer to this algorithm
as ODA-PS (Online Dual Averaging with Push-Sum based
communication). ODA-PS uses the network model (G2) for
its communication.

A. ODA-PS

For ODA-PS, each agent $i$ maintains an additional scalar
sequence $\{w_i(t)\}_{t=1}^{\infty} \subset \mathbb{R}$. Then, this algorithm particularizes
the update rule in (3a)-(3b) as

$$w_i(t+1) = \sum_{j=1}^{n} [A(t)]_{ij}\, w_j(t) \tag{24a}$$

$$z_i^k(t+1) = n\, \delta_i^k\, u_i(t) + \sum_{j=1}^{n} [A(t)]_{ij}\, z_j^k(t), \quad k \in [n] \tag{24b}$$

$$x_i(t+1) = \Pi_X^{\psi}\Big( \frac{z_i(t+1)}{w_i(t+1)},\, \alpha(t) \Big), \tag{24c}$$

where the weight matrix $A(t)$ is defined by the out-degrees of
the in-neighbors, i.e.,

$$[A(t)]_{ij} = \begin{cases} \dfrac{1}{d_j(t)} & \text{whenever } j \in \mathcal{N}_i^{\rm in}(t), \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$

The matrix $A(t)$ is column stochastic by construction.

Note that the above update rules are based on a simple
broadcast communication. Each agent $i$ broadcasts (or pushes)
the quantities $w_i(t)/d_i(t)$ and $z_i(t)/d_i(t)$ to all of the nodes in
its out-neighborhood $\mathcal{N}_i^{\rm out}(t)$. Then, in (24a)-(24b) each agent
simply sums all the received messages to obtain $w_i(t+1)$
and $z_i(t+1)$. The update rule (24c) can be executed locally.
Unlike ODA-C, the averaging matrix $A(t)$ in ODA-PS does
not require symmetry due to this broadcast-based nature of the
push-sum protocol. However, the asymmetry requires uniformity
of the positive weights across all agents (cf. Eq. (12)).
Here we simply use $r_i = 1/n$.

To complete the description of the algorithm, we must
specify the update policies $\{u_i(t)\}$. As in ODA-C, we assume
that the signal agent $i$ gets from the environment at time $t$ is
simply the $i$-th coordinate of the gradient of $f_t$ at the agent's
primal variable $x_i(t)$. Thus, we define:

$$u_i(t) = \langle \nabla f_t(x_i(t)), e_i \rangle, \quad i \in [n],\ t \ge 0. \tag{26}$$

We assume that each agent $i$ initializes its updates with
$w_i(0) = 1$ and $z_i(0) = 0$, while $x_i(0)$ can be any arbitrary
value in $X^n$. We also recall that the local action of agent $i$ at
time $t$ is given by the $i$th coordinate of $x_i(t)$, i.e.,

$$x^i(t) = x_i^i(t).$$

For notational convenience, let us denote the products of the
weight matrices $A(t), \dots, A(s)$ by $A(t : s)$, i.e.,

$$A(t : s) = A(t) \cdots A(s) \quad \text{for all } t \ge s \ge 0.$$

Also, we denote

$$A(t-1 : t) = I, \quad \text{for all } t \ge 1.$$

B. Regret of ODA-PS with local gradient signals

For the regret analysis, we first study the dynamics of the
dual iterates $z_i(t)$ and their "mean field" $\bar{z}(t)$ in the following
lemma. We recall that $\bar{z}(t) = (\bar{z}^1(t), \dots, \bar{z}^n(t))$ and
$z^k(t) = (z_1^k(t), \dots, z_n^k(t))$, $k \in [n]$.

Lemma 3: Let $z_i(0) = 0$ for all $i \in V$.

(a) The weighted sum

$$\bar{z}(t) = \frac{1}{n} \sum_{i=1}^{n} z_i(t)$$

evolves according to the linear dynamics

$$\bar{z}(t+1) = \bar{z}(t) + u(t),$$

where $u(t) = (u_1(t), \dots, u_n(t))$.

(b) For any $i, k \in [n]$, the iterates in (24b) evolve according
to the following dynamics

$$z_i^k(t) = n \sum_{s=0}^{t-1} [A(t-1 : s+1)]_{ik}\, u_k(s).$$

Proof:

(a) From relation (24b), we have for all $k \in [n]$

$$\begin{aligned}
\bar{z}^k(t+1) &= \frac{1}{n} \sum_{i=1}^{n} \Big[ n\, \delta_i^k\, u_i(t) + \sum_{j=1}^{n} [A(t)]_{ij}\, z_j^k(t) \Big] \\
&= u_k(t) + \frac{1}{n} \sum_{j=1}^{n} z_j^k(t) \sum_{i=1}^{n} [A(t)]_{ij} \\
&= u_k(t) + \bar{z}^k(t),
\end{aligned}$$

where the last equality follows from the column-stochasticity
of the matrix $A(t)$. The desired result follows
by stacking up the scalar relation above over $k$.

(b) By stacking up the equation (24b) over $i$, we have for all
$t \ge 1$ and $k \in [n]$

$$z^k(t+1) = A(t)\, z^k(t) + n\, u_k(t)\, e_k.$$

By unrolling this equation from time 0 to $t$, we obtain

$$\begin{aligned}
z^k(t) &= A(t-1 : 0)\, z^k(0) + n \sum_{s=0}^{t-1} u_k(s)\, A(t-1 : s+1)\, e_k \\
&= n \sum_{s=0}^{t-1} u_k(s)\, A(t-1 : s+1)\, e_k,
\end{aligned}$$

where the equalities follow from $A(t-1 : t) = I$ and
the initial condition $z_i(0) = 0$ for all $i \in V$. We get the
desired result by taking the $i$-th component of this vector. □

Lemma 3 tells us that the vector $\bar{z}(t)$ acts as a "mean field"
of the dual iterates $z_i(t)$. Also, if we define

$$\bar{x}(t+1) = \Pi_X^{\psi}\big( \bar{z}(t+1),\, \alpha(t) \big),$$

then from Lemma 3(a) we can see that

$$\bar{x}(t+1) = \Pi_X^{\psi}\Big( \sum_{s=0}^{t} u(s),\, \alpha(t) \Big),$$

which coincides with relation (6) in Theorem 1.

We now particularize the bound in Theorem 1 in this
scenario under the additional assumption on the Lipschitz
continuous gradients (Assumption 2 in Section IV-B).
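The push-sum updates (24a)-(24b) and the conservation property in Lemma 3(a) can be sketched in a few lines. The code below builds the column-stochastic matrix of (25) from lists of out-neighbors and performs one dual step in matrix form. The function names and the list-of-lists graph representation are assumptions made for this illustration.

```python
import numpy as np

def push_sum_matrix(out_neighbors):
    """Column-stochastic A(t) from (25).

    out_neighbors[j] lists the receivers of agent j and must contain j itself
    (the self-loop assumed in model (G2)); its length is the out-degree d_j(t).
    """
    n = len(out_neighbors)
    A = np.zeros((n, n))
    for j, outs in enumerate(out_neighbors):
        for i in outs:
            A[i, j] = 1.0 / len(outs)   # agent j pushes a 1/d_j(t) share to each receiver i
    return A

def oda_ps_dual_step(w, Z, u, A):
    """Updates (24a)-(24b): w <- A w and z_i^k <- n * delta_i^k * u_i + sum_j A_ij z_j^k.

    Z[i, k] holds z_i^k(t); u[i] is agent i's local update u_i(t).
    """
    n = len(w)
    return A @ w, A @ Z + n * np.diag(u)
```

Even when the graph changes from round to round, the average of the rows of `Z` tracks the running sum of the update vectors and the weights `w` keep summing to $n$, because every $A(t)$ is column stochastic. The rescaled quantity `Z[i] / w[i]` is what agent $i$ feeds into the primal update (24c).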

Theorem 4: Under Assumptions 1-2, the regret of the algorithm
(24a)-(24c) with the local update $u_i(t)$ of agent $i$
computed according to (26) can be upper-bounded as follows:
for all $T \ge 1$,

$$R(T) \le \frac{nL^2}{2} \sum_{t=1}^{T} \alpha(t-1) + \frac{C}{\alpha(T)} + \big(L + \sqrt{n}\, G D_X\big) \sum_{t=1}^{T} \alpha(t-1) \sum_{i=1}^{n} \Big\| \frac{z_i(t)}{w_i(t)} - \bar{z}(t) \Big\|.$$

Proof: Since the definition of $u(t)$ in ODA-PS (cf. Eq.
(26)) coincides with that in ODA-C (cf. Eq. (14)), we can
reuse all the derivations in the proof of Theorem 2 except for
the network-wide disagreement term:

$$\begin{aligned}
\|\bar{x}(t) - x_i(t)\| &= \Big\| \Pi_X^{\psi}\big(\bar{z}(t), \alpha(t-1)\big) - \Pi_X^{\psi}\Big( \frac{z_i(t)}{w_i(t)}, \alpha(t-1) \Big) \Big\| \\
&\le \alpha(t-1) \Big\| \bar{z}(t) - \frac{z_i(t)}{w_i(t)} \Big\|,
\end{aligned} \tag{27}$$

where the last inequality follows from the $\alpha$-Lipschitzian
property of the map $z \mapsto \Pi_X^{\psi}(z, \alpha)$ [33, Lemma 1]. □

This bound tells us that the regret $R(T)$ will be sublinear in $T$
with proper choice of the step size $\alpha(t)$ if the network-wide
disagreement term behaves nicely. Note that we can also make
use of Corollary 1 here if we can show

$$\sum_{i=1}^{n} \Big\| \bar{z}(t) - \frac{z_i(t)}{w_i(t)} \Big\| \le K,$$

for some constant $K > 0$.


C. Full regret analysis 

We now show that the network-wide disagreement term in
Theorem 4 is indeed upper-bounded by some constant. To
do this, we first restate a lemma from the literature.

Lemma 4: Let the graph sequence $\{G(t)\}$ be $B$-strongly
connected. Then the following statements are valid.

(a) There is a sequence $\{\phi(t)\} \subset \mathbb{R}^n$ of stochastic vectors
such that the matrix difference $A(t:s) - \phi(t)\mathbf{1}'$ for $t \ge s$
decays geometrically, i.e., for all $i, j \in [n]$,

$$\big|[A(t:s)]_{ij} - \phi_i(t)\big| \le \beta\,\theta^{t-s} \quad \text{for all } t \ge s \ge 0,$$

where we can always choose $\beta = 4$ (with a corresponding
$\theta \in (0,1)$). If in addition each $G(t)$ is regular, we may choose

$$\beta = 2\sqrt{2}, \quad \theta = \big(1 - 1/(4n^3)\big)^{1/B},$$

or

$$\beta = \sqrt{2}, \quad \theta = \max_{t \ge 0}\sigma_2(A(t)),$$

whenever $\sup_{t \ge 0}\sigma_2(A(t)) < 1$.

(b) The quantity

$$\gamma = \inf_{t \ge 0}\Big(\min_{1 \le i \le n}[A(t:0)\mathbf{1}]_i\Big)$$

satisfies

$$\gamma \ge \frac{1}{n^{nB}}.$$

Moreover, if the graphs $G(t)$ are regular, we have $\gamma = 1$.
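Both statements of Lemma 4 can be observed numerically on a small example. The sketch below assumes push-sum weights of the form $[A(t)]_{ij} = 1/d_j(t)$ whenever $j$ sends to $i$ (self-loops included, $d_j(t)$ the out-degree), and uses a hypothetical period-3 sequence of digraphs on 5 nodes whose union is a directed cycle. These edge sets are illustrative and are not the graphs of the experiment.

```python
import numpy as np

def push_sum_matrix(n, edges):
    """Column-stochastic A with A[i, j] = 1/d_j if j sends to i (self-loops included)."""
    adj = np.eye(n)
    for j, i in edges:                 # directed edge j -> i
        adj[i, j] = 1.0
    return adj / adj.sum(axis=0, keepdims=True)   # column j divided by out-degree d_j

n, B = 5, 3
# Hypothetical edge sets; no single graph is strongly connected, but their union is.
edge_sets = [[(0, 1), (1, 2)], [(2, 3), (3, 4)], [(4, 0)]]
mats = [push_sum_matrix(n, e) for e in edge_sets]

P = np.eye(n)
spreads, gammas = [], []
for t in range(120):
    P = mats[t % B] @ P                # P = A(t:0)
    # Row-wise spread: distance of the columns of A(t:0) from a common vector phi(t).
    spreads.append(float(np.max(P.max(axis=1) - P.min(axis=1))))
    # min_i [A(t:0) 1]_i, the quantity whose infimum defines gamma.
    gammas.append(float(P.sum(axis=1).min()))

column_sums_ok = bool(np.allclose(P.sum(axis=0), 1.0))
```

In this toy sequence the spread shrinks geometrically while the smallest row sum stays well above the worst-case guarantee $1/n^{nB}$, which is consistent with parts (a) and (b).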
The next lemma provides an upper bound for the network-wide
disagreement term.

Lemma 5: Let the sequences $\{z_i(t)\}$ and $\{w_i(t)\}$ be
generated according to the algorithm (24a)-(24b). Recall that
$\bar{z}(t) = \frac{1}{n}\sum_{i=1}^{n} z_i(t)$. Then, we have for all $t \ge 1$,


E 

2=1 


z*(f) 


Wi{f) 


-z{t) 


< 


2/?L 


70(61-1) 


where the constants $\beta$, $\gamma$ and $\theta$ are as defined in Lemma 4.

Proof: From the definitions of $z_i(t)$, $\bar{z}(t)$ and $z^k(t)$, we
have

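$$\sum_{i=1}^{n}\left\|\frac{z_i(t)}{w_i(t)} - \bar{z}(t)\right\|_2^2
  = \sum_{i=1}^{n}\sum_{k=1}^{n}\left(\frac{z_i^k(t)}{w_i(t)} - \bar{z}^k(t)\right)^2. \qquad (28)$$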


Thus, we can upper-bound the quantity on the right-hand side. 


By inspecting equation (24a), it is easy to see that for any
$i \in V$ and $t \ge 1$, we have

$$w_i(t) = \sum_{j=1}^{n}[A(t-1:0)]_{ij}\,w_j(0) = \sum_{j=1}^{n}[A(t-1:0)]_{ij}.$$

From this and Lemma we have the following chain of 
relations: 




zHt) 

Wi{t) 




-^Wfc(s) 


S=0 


= ^Mfc(s) 


EtMit - 1 : s + l)]zfe - - 1 : 0 )U 


^ ( \ I 3^=1 ([^(^ — 1 : s -I- l)]jfc — (pi(t — 1)) 

- ^ i -E?.JA«-i:o)k- 


s=0 

t-1 


s=0 


7 


(29)

where the inequalities follow from adding and subtracting
$\phi_i(t-1)$ and from Lemma 4. From relation (26), we have

$$|u_k(s)|^2 = \big|\langle \nabla f_s(x_k(s)), e_k\rangle\big|^2 \le \|\nabla f_s(x_k(s))\|_*^2 \le L^2.$$


Combining this and the fact that l39* ® 2 > 1 

s = 0 ,..., f — 1 , we further have 


= (f) 


^{t) 




< - - 


2/3L 


s=0 


76 »( 6 »- !)■ 


Substituting this estimate in relation (28), we get the desired
result. □
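The qualitative content of Lemma 5, namely that the disagreement between the push-sum ratios $z_i(t)/w_i(t)$ and the average $\bar{z}(t)$ stays bounded even though $\bar{z}(t)$ itself grows, can be illustrated with a small simulation. The sketch assumes the updates $w(t+1) = A(t)\,w(t)$ and $z^k(t+1) = A(t)\,z^k(t) + n\,u_k(t)\,e_k$ with $w_i(0) = 1$ and $z_i(0) = 0$; the edge sets and the constant signal $u_k(t) = L$ are hypothetical choices.

```python
import numpy as np

def push_sum_matrix(n, edges):
    """Column-stochastic A with A[i, j] = 1/d_j if j sends to i (self-loops included)."""
    adj = np.eye(n)
    for j, i in edges:                 # directed edge j -> i
        adj[i, j] = 1.0
    return adj / adj.sum(axis=0, keepdims=True)

n, L, T = 5, 1.0, 300
edge_sets = [[(0, 1), (1, 2)], [(2, 3), (3, 4)], [(4, 0)]]   # hypothetical graphs
mats = [push_sum_matrix(n, e) for e in edge_sets]

w = np.ones(n)                         # w_i(0) = 1
Z = np.zeros((n, n))                   # Z[i, k] = z_i^k(t), with z_i(0) = 0
disagreement = []
for t in range(T):
    A = mats[t % 3]
    u = np.full(n, L)                  # constant signal with |u_k(t)| <= L
    w = A @ w
    Z = A @ Z + n * np.diag(u)         # column k receives n * u_k at agent k
    z_bar = Z.mean(axis=0)             # average of the dual iterates over agents
    ratios = Z / w[:, None]            # row i is z_i(t) / w_i(t)
    disagreement.append(float(np.linalg.norm(ratios - z_bar, axis=1).sum()))

z_bar_matches = bool(np.allclose(z_bar, L * T))
```

With a constant signal the average grows linearly in $t$, while the summed disagreement settles into a pattern that repeats with the period of the graph sequence, so it remains bounded.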

By combining Theorem 4 and Lemma 5, we can now provide
the regret bound of ODA-PS.

Theorem 5: Let the preceding assumptions hold. With the choice
$\alpha(t) = 1/\sqrt{t+1}$ for all $t \ge 0$, and under the policy (26), the
distributed algorithm ODA-PS achieves the following regret:


R{T) < nL^ 



+ cVt +1, 


Afjy/n \ 

7^(1-0)7 


Vt 


where the constants $\beta$, $\gamma$ and $\theta$ are as defined in Lemma 4.

Proof: By Jensen's inequality, we have


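$$\sum_{i=1}^{n}\left\|\frac{z_i(t)}{w_i(t)} - \bar{z}(t)\right\|_2
  \le \sqrt{n\sum_{i=1}^{n}\left\|\frac{z_i(t)}{w_i(t)} - \bar{z}(t)\right\|_2^2}.$$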


Hence, using Lemma 5, we can estimate the network-wide
disagreement term as follows:


E 


Zjjt) 

m{t) 



< 



2/3L 

76 »( 6 »- 1 ) 


761(0- !)■ 


2 


Thus, the conditions of Corollary 1 with this modified
network-wide disagreement term hold with


K = 


2I3L 


70(0-1)' 

and the stated result follows. □

The bound shows that, for any time-varying sequence of
$B$-strongly connected digraphs, the worst-case regret of ODA-PS
is of order $O(\sqrt{T})$. The constants also capture the dependence
on the properties of the underlying network, i.e., the number
of nodes $n$ as well as the connectivity period $B$.
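The $O(\sqrt{T})$ order comes from the step-size sums: with $\alpha(t) = 1/\sqrt{t+1}$ we have $\sum_{t=1}^{T}\alpha(t-1) = \sum_{t=1}^{T} 1/\sqrt{t} \le 2\sqrt{T}$ and $1/\alpha(T) = \sqrt{T+1}$. A minimal check of the first inequality (function names are ours, for illustration only):

```python
import math

def step_size(t):
    """alpha(t) = 1 / sqrt(t + 1)."""
    return 1.0 / math.sqrt(t + 1)

def step_sum(T):
    """sum_{t=1}^{T} alpha(t - 1) = sum_{t=1}^{T} 1 / sqrt(t)."""
    return sum(step_size(t - 1) for t in range(1, T + 1))

horizons = [10, 100, 1000, 10000]
# Ratio of the step-size sum to sqrt(T); it should stay below 2 for every horizon.
ratios = [step_sum(T) / math.sqrt(T) for T in horizons]
```

The ratios increase with $T$ but stay below 2, so every term of the bound that is multiplied by $\sum_{t}\alpha(t-1)$ grows at most like $\sqrt{T}$ once the disagreement term is bounded by a constant.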


VI. Simulation Results 

Consider the problem of estimating some target vector
$x \in \mathbb{R}^p$ using measurements from a network of $n$ sensors.
Each sensor $i$ is in charge of estimating a subvector
$x_i \in \mathbb{R}^{p_i}$ of $x$, where $p_i \ll p$ and $p = \sum_{i=1}^{n} p_i$ is some
very large number. An example includes the localization of multiple
targets, where in this case $x \in \mathbb{R}^p$ becomes a stacked vector
of all target locations. When there are a number of spatially
dispersed targets, we can certainly benefit from distributed
sensing.



Fig. 2. Time-varying communication topology, changing in a cycle of three,
used for ODA-PS.


The sensors are assumed to have a linear model of $r(x) = Ax$,
where $A \in \mathbb{R}^{m \times p}$ and $m < p$. At each time $t$, each
sensor $i \in V$ estimates its portion $x_i(t) \in \mathbb{R}^{p_i}$ of the target
vector $x \in \mathbb{R}^p$, and then takes a measurement $q_t^i \in \mathbb{R}^{m_i}$, which
is corrupted by observation error and possibly by modeling
error. We assume all sources of errors can be represented as
an additive noise, i.e.,

$$q_t = A x(t) + \zeta_t,$$

where $q_t \in \mathbb{R}^m$ with $m = \sum_{i=1}^{n} m_i$ is a stacked vector of
all $q_t^i$'s and $\zeta_t \sim \mathcal{N}(0, P)$, where $P$ is the noise covariance
matrix.

The regret is computed with respect to the least-squares
estimate of the target locations at time $T$, i.e.,

$$x^* = \arg\min_{x \in X^n} \sum_{t=1}^{T} f_t(x),$$

where $f_t(x) = \frac{1}{2}\|Ax - q_t\|_2^2$ and we set $X = [-20, 20]$.
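As an illustration of the comparator, the sketch below generates noisy linear measurements and computes a least-squares estimate. The dimensions, the matrix $A$, the target, and the covariance $P$ are hypothetical, the measurements are generated around a fixed target, and the box constraint is ignored for simplicity (so the result is the constrained comparator only when it happens to lie in the box).

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, T = 4, 3, 200                       # hypothetical sizes with m < p
A = rng.standard_normal((m, p))           # hypothetical sensing matrix
x_true = rng.uniform(-20, 20, size=p)     # hypothetical target vector
P = 0.1 * np.eye(m)                       # hypothetical noise covariance
noise = rng.multivariate_normal(np.zeros(m), P, size=T)
Q = A @ x_true + noise                    # row t is the measurement q_t

def total_loss(x):
    """sum_t f_t(x) with f_t(x) = 0.5 * ||A x - q_t||^2."""
    return 0.5 * np.sum((Q - A @ x) ** 2)

# sum_t 0.5 ||A x - q_t||^2 differs from 0.5 * T * ||A x - mean(q)||^2 only by a
# constant, so the unconstrained minimizer solves one least-squares problem.
q_mean = Q.mean(axis=0)
x_star, *_ = np.linalg.lstsq(A, q_mean, rcond=None)

gradient_at_star = A.T @ (A @ x_star - q_mean)
```

Since $m < p$ the problem is underdetermined and `np.linalg.lstsq` returns the minimum-norm minimizer; any other minimizer attains the same cumulative loss.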

For ODA-C, we experiment with an $n = 5$ node cycle graph
whose communication topology is given as:

$$1 \leftrightarrow 2 \leftrightarrow 3 \leftrightarrow 4 \leftrightarrow 5 \leftrightarrow 1.$$

We set $r_i = 1/5$, $M_{ii} = 1/2$ for all $i$, and $M_{ij} = 1/4$ for all $i \leftrightarrow j$.
For ODA-PS, we experiment with a time-varying sequence of
digraphs with $n = 5$ nodes whose communication topology
is changing periodically with period 3. The graph sequence
is, therefore, 3-strongly connected. In Figure 2, we depict
the repetition of the 3 corresponding graphs. The averaging
matrices $A(t)$ (cf. Eq. (25)) can be determined accordingly.
We ran our algorithms once for each $T \in [1000]$. That is,
for a given $T$, the iterates in the algorithms are updated from
$t = 1$ to $t = T$. We used step size $\alpha(t) = 1/\sqrt{t+1}$ for both
algorithms.
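The ODA-C weights above are easy to sanity-check. The following sketch builds $M$ for the 5-node cycle and verifies that it is row-stochastic and reversible with respect to $r$, i.e., $r_i M_{ij} = r_j M_{ji}$; the variable names are ours.

```python
import numpy as np

n = 5
M = np.zeros((n, n))
for i in range(n):
    M[i, i] = 0.5                      # M_ii = 1/2
    M[i, (i + 1) % n] = 0.25           # M_ij = 1/4 for the two cycle neighbors
    M[i, (i - 1) % n] = 0.25
r = np.full(n, 1.0 / n)                # r_i = 1/5

row_stochastic = bool(np.allclose(M.sum(axis=1), 1.0))
# Reversibility (detailed balance): r_i M_ij = r_j M_ji for all i, j.
flow = r[:, None] * M
reversible = bool(np.allclose(flow, flow.T))
# Reversibility implies that r is a stationary distribution of M.
stationary = bool(np.allclose(r @ M, r))
```

Because $M$ is symmetric and $r$ is uniform, reversibility holds trivially here; a non-uniform $r$ would require weights adjusted so that detailed balance still holds.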

In Figure 3, we depict the average regret $R(T)/T$ over time
$T$ of the distributed sensing problem when ODA-C and ODA-PS
are used, respectively. It shows that the regret is sublinear
for both algorithms and the average $R(T)/T$ goes to zero as
the time increases.


Fig. 3. The average regret $R(T)/T$ vs. iterations for online distributed
active sensing using ODA-C (left) and ODA-PS (right).

Footnote: Although target localization is usually formulated as a nonlinear
estimation problem [39], for considerations of simplicity one often employs a
linearized model using a first-order Taylor expansion around the measurements;
see, e.g., [40].


VII. Conclusion

We have studied an online optimization problem in a
multiagent network. We proposed two decentralized variants
of Nesterov's primal-dual algorithm, namely, ODA-C using
circulation-based dynamics for time-invariant networks and
ODA-PS using the broadcast-based push-sum protocol for
time-varying networks. We have established a generic regret bound
and provided its refinements for certain information exchange
policies. The regret is shown to grow as $O(\sqrt{T})$ when the
step size is $\alpha(t) = 1/\sqrt{t+1}$. For ODA-C, the bound is
valid for a static connectivity graph and a row-stochastic
matrix of weights $M = [M_{ij}]$ which is reversible with respect
to a strictly positive probability vector $r$. For ODA-PS, the
bound is valid for a uniformly strongly connected sequence
of digraphs and column-stochastic matrices of weights $A(t)$
whose components are based on the out-degrees of neighbors.
Simulation results on a sensor network exhibit the desired
theoretical properties of the two algorithms.